Author Details

Search engines have become one of the most relied-upon tools for research in any domain. People tend to trust the results of their searches and act on them, and Google is the most widely trusted search engine. Because this project deals with search in the medical domain, it requires extra care. The empirical domain of automatic learning (machine learning) is used in tasks such as medical decision support, medical imaging, protein-protein interaction, extraction of medical knowledge, and overall patient care management. The machine learning field has gained momentum in almost every domain of research and has only recently become a reliable tool in the medical domain. It is envisioned as a tool by which computer-based systems can be integrated into the healthcare field in order to provide better, more efficient medical care. In this project, we provide users with the information they need about disease-treatment relations, such as cure, prevention, side effects, symptoms, medicines, and doctors. Users with a health concern can get immediate access to patient diagnoses, allergies, and lab test results, which enables better and more time-efficient medical decisions. Our evaluation results for these tasks show that the proposed methodology obtains reliable outcomes and could be integrated into an application for use in the medical care domain. We also show that the search has greater impact than, and outperforms, the existing project.

Decision Trees for Uncertain Data

Authors

A. Pandian ¹, J. Venkata Subramanian ¹, S. Balamanikandan ²

Affiliations
1 Department of MCA, SRM University, Chennai, IN
2 Thiruvalluvar University, Kallakurichi, IN

Source

Data Mining and Knowledge Engineering, Vol 4, No 3 (2012), Pagination: 123-128

Abstract

Classification based on decision trees is one of the important problems in data mining and has applications in many fields. In recent years, database systems have become highly distributed, and distributed system paradigms such as federated and peer-to-peer databases are being adopted. In this paper, we consider the problem of inducing decision trees in a large distributed network of high-dimensional databases. Our work is motivated by the existence of distributed databases in healthcare and in bioinformatics, and by the vision that these databases will soon contain large amounts of genomic data, which is characterized by high dimensionality. Current decision tree algorithms would require high communication bandwidth when executed on such data, and that bandwidth is unlikely to exist in large-scale distributed systems. We present an algorithm that sharply reduces the communication overhead by sending just a fraction of the statistical data, a fraction that is nevertheless sufficient to derive exactly the same decision tree that a sequential learner would learn from all the data in the network. Value uncertainty arises in many applications during the data collection process. Example sources of uncertainty include measurement and quantization errors, data staleness, and multiple repeated measurements. With uncertainty, the value of a data item is often represented not by a single value but by multiple values forming a probability distribution. Rather than abstracting uncertain data by statistical derivatives (such as the mean and median), we find that the accuracy of a decision tree classifier can be much improved if the "complete information" of a data item, that is, its probability density function (pdf), is utilized. We extend classical decision tree building algorithms to handle data tuples with uncertain values. Extensive experiments show that the resulting classifiers are more accurate than those using value averages. Processing pdfs is, however, computationally more costly than processing single values.
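The difference between using a full pdf and using a value average can be illustrated with a small sketch. This is not the authors' algorithm; it only assumes that each uncertain value is given as a discrete list of (value, probability) pairs and that a tuple contributes fractional weight to both sides of a split. The function names `entropy`, `split_entropy`, and `to_mean` are hypothetical and chosen for illustration.

```python
import math
from collections import defaultdict


def entropy(class_weights):
    """Shannon entropy (in bits) of a dict mapping class label -> weight."""
    total = sum(class_weights.values())
    if total == 0:
        return 0.0
    h = 0.0
    for w in class_weights.values():
        if w > 0:
            p = w / total
            h -= p * math.log2(p)
    return h


def split_entropy(tuples, z):
    """Weighted entropy after splitting at threshold z.

    tuples: list of (pdf, label), where pdf is a list of (value, prob)
    pairs whose probabilities sum to 1 (an assumed discrete representation).
    Each tuple sends the probability mass at or below z to the left branch
    and the remaining mass to the right branch.
    """
    left = defaultdict(float)
    right = defaultdict(float)
    for pdf, label in tuples:
        p_left = sum(p for v, p in pdf if v <= z)
        left[label] += p_left
        right[label] += 1.0 - p_left
    n_left = sum(left.values())
    n_right = sum(right.values())
    n = n_left + n_right
    return (n_left / n) * entropy(left) + (n_right / n) * entropy(right)


def to_mean(pdf):
    """Collapse a pdf to a single point at its expected value."""
    return [(sum(v * p for v, p in pdf), 1.0)]


# Two tuples with the same mean (5) but different distributions.
tuples = [
    ([(1, 0.5), (9, 0.5)], "a"),
    ([(5, 1.0)], "b"),
]
mean_tuples = [(to_mean(pdf), label) for pdf, label in tuples]

pdf_score = split_entropy(tuples, 3)        # uses the full distribution
mean_score = split_entropy(mean_tuples, 5)  # averages are indistinguishable
```

In this toy case the two tuples have the same average, so no threshold on the averages can separate the classes, while a threshold evaluated on the pdfs yields a lower weighted entropy. The inner sum over each pdf also shows why working with distributions costs more than working with single values.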

Keywords

Data Mining, Distributed Algorithms, Decision Trees, Classification, High Dimension Data.

Efficient and Accurate Discovery of Patterns in Sequence Datasets

Authors

A. Pandian ¹, J. Venkatasubramanian ¹, S. E. Chandiran ²

Affiliations
1 Dept of MCA, SRM University, Chennai, IN
2 SRM University, Chennai, IN

Source

Data Mining and Knowledge Engineering, Vol 4, No 3 (2012), Pagination: 139-144

Abstract

Existing sequence mining algorithms mostly focus on mining for subsequences. However, a large class of applications, such as biological DNA and protein motif mining, requires efficient mining of "approximate" patterns that are contiguous. The few existing algorithms that can be applied to find such contiguous approximate patterns have drawbacks such as poor scalability, lack of guarantees in finding the pattern, and difficulty in adapting to other applications. In this paper, we present a new algorithm called FLAME (FLexible and Accurate Motif Detector). FLAME is a flexible suffix-tree-based algorithm that can be used to find frequent patterns with a variety of definitions of motif (pattern) models. It is also accurate, as it always finds the pattern if it exists. Using both real and synthetic datasets, we demonstrate that FLAME is fast, scalable, and outperforms existing algorithms on a variety of performance metrics. Using FLAME, it is now possible to mine datasets that would have been prohibitively difficult with existing tools.
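To make the notion of a contiguous approximate pattern concrete, the sketch below counts, by brute force, how many fixed-length windows of a sequence lie within a given Hamming distance of a candidate pattern. It is a naive baseline, not FLAME: it uses no suffix tree, it assumes a Hamming-distance motif model, and it only considers candidates that occur verbatim in the sequence. The names `hamming` and `approximate_motifs` are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def approximate_motifs(seq, length, max_dist, min_support):
    """Return {pattern: support} for patterns with enough approximate matches.

    A window matches a pattern when their Hamming distance is at most
    max_dist. Candidates are limited to substrings present in seq, which is
    a simplifying assumption of this sketch.
    """
    windows = [seq[i:i + length] for i in range(len(seq) - length + 1)]
    result = {}
    for cand in set(windows):
        support = sum(1 for w in windows if hamming(cand, w) <= max_dist)
        if support >= min_support:
            result[cand] = support
    return result


# Patterns of length 4 with at least 3 windows within one mismatch.
motifs = approximate_motifs("ACGTACGAACGT", length=4, max_dist=1, min_support=3)
```

The running time of this baseline grows quadratically with the number of windows, which gives a sense of the scalability problem the abstract describes for large sequence datasets.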

Keywords

FLAME, Data Mining, Distributed Algorithms, Dataset, Decision Trees, Classification.
